Skip to content

Fix Energon data loading incompatibility with updated Qwen3-VL finetuning pipeline#2680

Open
aub123 wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
aub123:fix/energon-qwen3vl-loader
Open

Fix Energon data loading incompatibility with updated Qwen3-VL finetuning pipeline#2680
aub123 wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
aub123:fix/energon-qwen3vl-loader

Conversation

@aub123
Copy link

@aub123 aub123 commented Mar 6, 2026

Background

The data loading logic for Energon-format datasets is not compatible with the updated Qwen3-VL finetuning pipeline.

Recent updates changed the expected multimodal sample structure, which causes mismatches when loading Energon datasets.

Changes

  • Adjust Energon data loading logic in qwen3_vl_bridge.py
  • Align multimodal sample parsing with the updated Qwen3-VL finetuning interface
  • Ensure Energon datasets can be used directly in the current training pipeline

Notes

This change focuses on compatibility with the new Qwen3-VL finetuning logic and does not modify the Energon dataset format itself.

Tested on Qwen3-VL-8B-Instruct model.

Summary by CodeRabbit

  • Documentation

    • Enhanced README with improved formatting, structure guidance, and inference instructions for Qwen VL model examples
  • Bug Fixes

    • Corrected image file extension mapping in Energon dataset configuration

@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aub123

This comment was marked as off-topic.

…ng pipeline

Signed-off-by: aub123 <2546319206@qq.com>
@aub123 aub123 force-pushed the fix/energon-qwen3vl-loader branch from 2530b0e to 9603399 Compare March 6, 2026 09:58
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

Updates image and video handling in a Qwen VL model example by introducing a new tensor conversion method in the task encoder and correcting the file extension mapping in the dataset documentation from jpg to jpgs.

Changes

Cohort / File(s) Summary
Documentation
examples/models/vlm/qwen3_vl/README.md
Added formatting blocks (directory structure, import/export steps, notes) and changed field_map entry from imgs: jpg to imgs: jpgs for correct image extension mapping.
Task Encoder Implementation
src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py
Introduced private method _convert_to_tensor in videohandler class to handle numpy array conversion to CHW float32 tensors and raw byte fallback; refactored image and video decoding logic to use the new conversion method.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR contains significant changes to Energon data loading pipeline and tensor conversion logic, but provides no documented test results, metrics, or verification details despite claiming testing was performed. Add detailed test results to PR description including loss curves, convergence metrics, specific test cases, and performance comparisons on Qwen3-VL-8B-Instruct model to demonstrate compatibility without regression.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix Energon data loading incompatibility with updated Qwen3-VL finetuning pipeline' directly and accurately summarizes the main change: updating Energon dataset loading to work with the updated Qwen3-VL pipeline, which aligns with the documented PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py (2)

216-216: extensions_mapping is now unused.

The self.extensions_mapping dictionary is no longer used after this refactor. Previously it was used to map extensions to keys for imagehandler, but now _convert_to_tensor always passes "jpg" directly. Consider removing the unused attribute to avoid confusion.

🧹 Proposed cleanup
     def __init__(self, imagespec):
         self.extensions = ["jpgs", "mp4s", "videos"]
-        self.extensions_mapping = {"jpgs": "jpg", "mp4s": "jpg", "videos": "jpg"}
         self.image_handler = imagehandler(imagespec)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py` at line
216, Remove the unused attribute self.extensions_mapping from the TaskEncoder
class: locate where it is defined (self.extensions_mapping = {"jpgs": "jpg",
"mp4s": "jpg", "videos": "jpg"}) and delete that line since _convert_to_tensor
now always passes "jpg" directly and imagehandler no longer relies on it; ensure
no other code references extensions_mapping (search for extensions_mapping)
before committing.

219-231: Add type hints for the new method.

Per coding guidelines, functions should have type hints for arguments and return types. This improves readability and enables static type checking.

💡 Suggested type hints
-    def _convert_to_tensor(self, data):
-        """Convert numpy array or bytes to tensor.
-        
-        The wds conversion script stores images as numpy arrays (HWC, uint8),
-        so we need to handle both numpy arrays and raw bytes.
-        """
+    def _convert_to_tensor(self, data: np.ndarray | bytes) -> torch.Tensor:
+        """Convert numpy array or bytes to tensor.
+
+        The wds conversion script stores images as numpy arrays (HWC, uint8),
+        so we need to handle both numpy arrays and raw bytes.
+
+        Args:
+            data: Image data as numpy array (HWC, uint8) or raw bytes.
+
+        Returns:
+            Tensor in CHW format with float32 values in [0, 1].
+        """
         if isinstance(data, np.ndarray):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py` around
lines 219 - 231, Add type hints to _convert_to_tensor: annotate the data
parameter as Union[np.ndarray, bytes] and the return type as torch.Tensor (i.e.,
def _convert_to_tensor(self, data: Union[np.ndarray, bytes]) -> torch.Tensor).
Also ensure typing Union is imported (from typing import Union) and that
image_handler's return type is compatible with torch.Tensor; update
image_handler signature or cast its result to torch.Tensor if needed. This
change applies to the _convert_to_tensor method in task_encoder.py and any
related image_handler definition used by this method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py`:
- Line 216: Remove the unused attribute self.extensions_mapping from the
TaskEncoder class: locate where it is defined (self.extensions_mapping =
{"jpgs": "jpg", "mp4s": "jpg", "videos": "jpg"}) and delete that line since
_convert_to_tensor now always passes "jpg" directly and imagehandler no longer
relies on it; ensure no other code references extensions_mapping (search for
extensions_mapping) before committing.
- Around line 219-231: Add type hints to _convert_to_tensor: annotate the data
parameter as Union[np.ndarray, bytes] and the return type as torch.Tensor (i.e.,
def _convert_to_tensor(self, data: Union[np.ndarray, bytes]) -> torch.Tensor).
Also ensure typing Union is imported (from typing import Union) and that
image_handler's return type is compatible with torch.Tensor; update
image_handler signature or cast its result to torch.Tensor if needed. This
change applies to the _convert_to_tensor method in task_encoder.py and any
related image_handler definition used by this method.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 598a6015-fbb7-4a34-ac42-89d0aba24cdd

📥 Commits

Reviewing files that changed from the base of the PR and between c15303e and 9603399.

📒 Files selected for processing (2)
  • examples/models/vlm/qwen3_vl/README.md
  • src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant